Our research topic is Trend of Major Types of Crimes commited by White Males in the DC Area in 2016-2021. We chose this topic because we are interested in the impact of COVID-19 on crimes. We will use the data provided by the Metropolitan Police of DC regarding adult arrests over a time period stretching between 2016-2021.
We read the data .CSV files of adults arrest in DC area from 2016-2021
df_2016<-data.frame(read.csv("Arrests 2016 Public.csv"))
df_2017<-data.frame(read.csv("Arrests 2017 Public.csv"))
df_2018<-data.frame(read.csv("Arrests by Year, 2018.csv"))
df_2019<-data.frame(read.csv("Arrests by Year, 2019.csv"))
df_2020<-data.frame(read.csv("Arrests by Year 2020.csv"))
df_2021<-data.frame(read.csv("2021 Adult Arrests.csv"))
c16 <- c(colnames(df_2016))
c18 <- c(colnames(df_2018))
The column names of the data in 2016 and 2017 were not the same with others. The below table shows the column names of the data in 2016 and the data in 2016.
| col # | 2016 | 2018 |
|---|---|---|
| 1 | Arrestee.Type | Arrestee.Type |
| 2 | Arrest.Year | Arrest.Year |
| 3 | Arrest.Date | Arrest.Date |
| 4 | Arrest.Hour | Arrest.Hour |
| 5 | CCN | CCN |
| 6 | Arrest.Number. | Arrest.Number. |
| 7 | Age | Age |
| 8 | Defendant.PSA | Defendant.PSA |
| 9 | Defendant.District | Defendant.District |
| 10 | Defendant.Race | Defendant.Race |
| 11 | Defendant.Ethnicity | Defendant.Ethnicity |
| 12 | Defendant.Sex | Defendant.Sex |
| 13 | Arrest.Category | Arrest.Category |
| 14 | Charge.Description | Charge.Description |
| 15 | Arrest.Location.PSA | Arrest.Location.PSA |
| 16 | Arrest.Location.District | Arrest.Location.District |
| 17 | Arrest.Location.Block.GeoX | Arrest.Block.GEOX |
| 18 | Arrest.Location.Block.GeoY | Arrest.Block.GEOY |
| 19 | Offense.GEOY | Arrest.Latitude |
| 20 | Offense.GEOX | Arrest.Longitude |
| 21 | Offense.PSA | Offense.Location.PSA |
| 22 | Offense.District | Offense.Location.District |
| 23 | Arrest.Latitude | Offense.Block.GEOX |
| 24 | Arrest.Longitude | Offense.Block.GEOY |
| 25 | Offense.Latitude | Offense.Latitude |
| 26 | Offense.Longitude | Offense.Longitude |
The column names were same from the first column to the 14th column
in both data. On the other hand, the name and order of 15th and latter
columns were a bit different in those data. The latter columns were
about locations, and we were not very interested in the detail location.
Therefore, we deleted the latter columns except for the 16th and 22nd
columns. In addition, we dropped CNN (col #5) and
Arrest.Number. (col #6) because they were IDs and useless
for our analysis.
The format of date was different from years; the data in 2016 and 2017 has the format like 2016-01-01, the data in 2018 to 2020 has the format like 1/1/2018, and the data in 2021 has the format like 2021/1/1. We coverted Since different date formats for different years are difficult to analyze, we will unify the date format to “yyyy-mm-dd”.
After deleting some columns and changing the date format, we binded data frames by rows.
# convert format
df_2018$Arrest.Date <- as.Date(df_2018$Arrest.Date, format = "%m/%d/%Y") %>% format()
df_2019$Arrest.Date <- as.Date(df_2019$Arrest.Date, format = "%m/%d/%Y") %>% format()
df_2020$Arrest.Date <- as.Date(df_2020$Arrest.Date, format = "%m/%d/%Y") %>% format()
df_2021$Arrest.Date <- as.Date(df_2021$Arrest.Date, format = "%Y/%m/%d") %>% format()
#bind df_2016 and df_2017, and delete some columns
df_16_17 <- rbind(df_2016, df_2017)[,-c(5,6,15,17:21,23:26)]
names(df_16_17)[c(13,14)] <- c('Arrest.Location.District','Offense.Location.District') #rename columns
#bind df_2018 - df_2021, and delete some columns
df_18_21 <- rbind(df_2018, df_2019, df_2020, df_2021)[,-c(5,6,15,17:21,23:26)]
DF<-rbind(df_16_17,df_18_21)
To see whether there were abnormal values, we created the table showing some statistics for numerical variables.
xkablesummary(subset(DF,select=c(Arrest.Year, Arrest.Hour, Age)))
| Arrest.Year | Arrest.Hour | Age | |
|---|---|---|---|
| Min | Min. :2016 | Min. : 0.00 | Min. : 18.00 |
| Q1 | 1st Qu.:2017 | 1st Qu.: 6.00 | 1st Qu.: 25.00 |
| Median | Median :2018 | Median :12.00 | Median : 32.00 |
| Mean | Mean :2018 | Mean :11.81 | Mean : 35.19 |
| Q3 | 3rd Qu.:2019 | 3rd Qu.:18.00 | 3rd Qu.: 43.00 |
| Max | Max. :2021 | Max. :23.00 | Max. :121.00 |
The maximum age was too old. 55 rows were assigned an age of over 100 years (117-121 ) in these data, and it seemed to be wrong. Therefore, we dropped these rows.
DF <- DF[!DF$Age>=100,]
Arrest.Category had some different values for 2021 and
other years:
Therefore, we coverted these values in 2021 into the correspond values in other years.
DF <- mutate(DF, Arrest.Category = gsub(Arrest.Category, pattern = "Release Violations/Fugitive.*", replacement = "Release Violations/Fugitive"))
DF <- mutate(DF, Arrest.Category = gsub(Arrest.Category, pattern = "Fraud and Financial Crimes.*", replacement = "Fraud and Financial Crimes"))
Since we were interested in crimes committed by while males, we
dropped rows where the value of Defendant.Race was not
“White”. The structure of the final data is shown in the below
table.
DF_WM <- subset(DF, subset = Defendant.Race=='WHITE' & Defendant.Sex=='MALE')
data.frame(column_name = names(DF_WM),
class = sapply(DF_WM, typeof),
first_values = sapply(DF_WM, function(x) paste0(head(x), collapse = ", ")),
row.names = NULL) %>%
kable("simple", caption = 'Data frame structure')
| column_name | class | first_values |
|---|---|---|
| Arrestee.Type | character | Adult Arrest, Adult Arrest, Adult Arrest, Adult Arrest, Adult Arrest, Adult Arrest |
| Arrest.Year | integer | 2016, 2016, 2016, 2016, 2016, 2016 |
| Arrest.Date | character | 2016-01-01, 2016-01-01, 2016-01-01, 2016-01-01, 2016-01-01, 2016-01-01 |
| Arrest.Hour | integer | 0, 0, 1, 1, 13, 2 |
| Age | integer | 39, 27, 27, 26, 48, 25 |
| Defendant.PSA | character | Out of State, Out of State, Out of State, Out of State, 404, Out of State |
| Defendant.District | character | Out of State, Out of State, Out of State, Out of State, 4D, Out of State |
| Defendant.Race | character | WHITE, WHITE, WHITE, WHITE, WHITE, WHITE |
| Defendant.Ethnicity | character | UNKNOWN, NOT HISPANIC, HISPANIC, NOT HISPANIC, NOT HISPANIC, HISPANIC |
| Defendant.Sex | character | MALE, MALE, MALE, MALE, MALE, MALE |
| Arrest.Category | character | Simple Assault, Simple Assault, Driving/Boating While Intoxicated, Simple Assault, Simple Assault, Simple Assault |
| Charge.Description | character | Threats To Do Bodily Harm -misd, Simple Assault, Driving While Intoxicated -2nd Off, Simple Assault, Simple Assault, Simple Assault |
| Arrest.Location.District | character | 2D, 3D, 4D, 5D, 1D, 3D |
| Offense.Location.District | character | 2D, 3D, 4D, 5D, 1D, 3D |
The number of crimes is as follows. Crime occurrences have decreased after COVID-19.
| year | the number of crimes |
|---|---|
| 2016 | 2620 |
| 2017 | 2636 |
| 2018 | 2297 |
| 2019 | 2191 |
| 2020 | 1425 |
| 2021 | 1109 |
| 2016 - 2019 (before COVID-19) | 2436 |
| 2020 - 2021 (after COVID-19) | 1267 |
We created some bar plots to see the number of occurrences per type
of crime.
The Bar plot of crimes in 2016 - 2021 is as follows:
ggplot(DF_WM, aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 1: Bar plot of crimes in 2016 - 2021") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
The Bar plots of crimes before COVID-19 (2016 - 2019) and after COVID-19 (2020 - 2021) are as follows:
ggplot(subset(DF_WM,Arrest.Year <= 2019), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 2: Bar plot of crimes befor COVID-19") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(subset(DF_WM,Arrest.Year > 2019), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 3: Bar plot of crimes after COVID-19") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
The Bar plots of crimes in each year are as follows:
ggplot(subset(DF_WM,Arrest.Year == 2016), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 4: Bar plot of crimes in 2016") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(subset(DF_WM,Arrest.Year == 2017), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 5: Bar plot of crimes in 2017") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(subset(DF_WM,Arrest.Year == 2018), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 6: Bar plot of crimes in 2018") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(subset(DF_WM,Arrest.Year == 2019), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 7: Bar plot of crimes in 2019") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(subset(DF_WM,Arrest.Year == 2020), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 8: Bar plot of crimes in 2020") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(subset(DF_WM,Arrest.Year == 2021), aes(forcats::fct_infreq(Arrest.Category))) +
ggtitle("Figure 9: Bar plot of crimes in 2021") + xlab("crime types") + geom_bar() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
“Offenses Against Family & Children” have been increasing after COVD-19.
cnt_2016 <- table(subset(DF_WM,Arrest.Year==2016)$Arrest.Category)
pos_2016 <- order(cnt_2016, decreasing = TRUE)[1:6]
crime_2016 <- names(cnt_2016)[pos_2016]
cnt_2017 <- table(subset(DF_WM,Arrest.Year==2017)$Arrest.Category)
pos_2017 <- order(cnt_2017, decreasing = TRUE)[1:7]
crime_2017 <- names(cnt_2017)[pos_2017]
cnt_2018 <- table(subset(DF_WM,Arrest.Year==2018)$Arrest.Category)
pos_2018 <- order(cnt_2018, decreasing = TRUE)[1:6]
crime_2018 <- names(cnt_2018)[pos_2018]
cnt_2019 <- table(subset(DF_WM,Arrest.Year==2019)$Arrest.Category)
pos_2019 <- order(cnt_2019, decreasing = TRUE)[1:7]
crime_2019 <- names(cnt_2019)[pos_2019]
cnt_2020 <- table(subset(DF_WM,Arrest.Year==2020)$Arrest.Category)
pos_2020 <- order(cnt_2020, decreasing = TRUE)[1:7]
crime_2020 <- names(cnt_2020)[pos_2020]
cnt_2021 <- table(subset(DF_WM,Arrest.Year==2021)$Arrest.Category)
pos_2021 <- order(cnt_2021, decreasing = TRUE)[1:7]
crime_2021 <- names(cnt_2021)[pos_2021]
The top 6 crimes (or 7 crimes when ‘Other Crimes’ are included) in each year are as follows.
| Rank | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|
| 1 | Simple Assault | Simple Assault | Simple Assault | Simple Assault | Simple Assault | Simple Assault |
| 2 | Traffic Violations | Traffic Violations | Traffic Violations | Traffic Violations | Driving/Boating While Intoxicated | Traffic Violations |
| 3 | Release Violations/Fugitive | Release Violations/Fugitive | Release Violations/Fugitive | Prostitution | Release Violations/Fugitive | Driving/Boating While Intoxicated |
| 4 | Driving/Boating While Intoxicated | Driving/Boating While Intoxicated | Driving/Boating While Intoxicated | Driving/Boating While Intoxicated | Traffic Violations | Release Violations/Fugitive |
| 5 | Liquor Law Violations | Other Crimes | Narcotics | Release Violations/Fugitive | Offenses Against Family & Children | Other Crimes |
| 6 | Narcotics | Disorderly Conduct | Theft | Other Crimes | Other Crimes | Offenses Against Family & Children |
| 7 | NA | Liquor Law Violations | NA | Theft | Narcotics | Damage to Property |
To see the trend of the above major crimes, we created a line plot as follows.
SA_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Simple Assault',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Simple Assault',]$Arrest.Year==x)})
TV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Traffic Violations',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Traffic Violations',]$Arrest.Year==x)})
RV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Release Violations/Fugitive',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Release Violations/Fugitive',]$Arrest.Year==x)})
DI_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Driving/Boating While Intoxicated',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Driving/Boating While Intoxicated',]$Arrest.Year==x)})
N_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Narcotics',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Narcotics',]$Arrest.Year==x)})
LV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Liquor Law Violations',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Liquor Law Violations',]$Arrest.Year==x)})
T_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Theft',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Theft',]$Arrest.Year==x)})
DV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Offenses Against Family & Children',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Offenses Against Family & Children',]$Arrest.Year==x)})
DC_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Disorderly Conduct',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Disorderly Conduct',]$Arrest.Year==x)})
P_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Prostitution',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Prostitution',]$Arrest.Year==x)})
DP_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Damage to Property',]$Arrest.Year),
function(x){sum(DF_WM[DF_WM$Arrest.Category=='Damage to Property',]$Arrest.Year==x)})
year_lst <- 2016:2021
major_crimes_df <- data.frame(year_lst, SA_cnt, TV_cnt, RV_cnt, DI_cnt, N_cnt, LV_cnt, T_cnt, DV_cnt, DC_cnt, P_cnt, DP_cnt)
colnames(major_crimes_df) <- c('Year', 'Simple Assault', 'Traffic Violations', 'Release Violations/Fugitive', 'Driving/Boating While Intoxicated',
'Narcotics', 'Liquor Law Violations', 'Theft', 'Offenses Against Family & Children', 'Disorderly Conduct', 'Prostitution', 'Damage to Property')
major_crimes_df2 <- major_crimes_df %>% gather(key = 'Crimes', value = "Count", -Year)
ggplot(data=major_crimes_df2, aes(x=Year, y=Count, color=Crimes)) +
geom_line() + geom_point()
“Simple Assault”, “Traffic Violations”, and “Theft” have clearly declined since 2020. On the other, “Offenses Against Family & Children” has increased in 2020 and 2021 compared to previous years. COVID-19 seems to be related to these trend change. We posed the following SMART QUESTION, and we will analyze these four crimes in detail in the following.
Is there a significant difference in “Simple Assault”, “Traffic Violations”, “Theft”, and “Offenses Against Family & Children” trends among adult white males within the DC area between 2016 and 2021, and could COVID protocols play a role in these trend shifts?
Since crime is likely to be a rare event, the number of occurrences per day of a given crime is expected to follow Poisson distribution. Poisson distribution is a distribution used to describe the distribution of the number of rare phenomena when a large number of them are observed. If a distribution follows Poisson distribution, and the average number of occurrences of the phenomenon is \(\lambda\), the probability that the phenomenon will occur \(x\) times is given by \[p(x) = \exp(-\lambda)\frac{\lambda^{x}}{x!}.\] In the following, we will estimate \(\lambda\) of each crime before and after COVID-19 to see there is a difference in crime trend.
The trend of “Offenses Against Family & Children,” Domestic Violence (DV), appears to have changed after COVID-19. The frequency table of DV before COVID-19 is as follows.
DF_WM_16_19 <- DF_WM[DF_WM$Arrest.Year%in%c(2016,2017,2018,2019),]
DF_WM_16_19_DV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Offenses Against Family & Children',]
# table of date and the number of occurrences
DV_day_16_19 <- sapply(unique(DF_WM_16_19_DV$Arrest.Date),
function(x){sum(DF_WM_16_19_DV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 1364 | 0.9336071 |
| 1 | 95 | 0.065024 |
| 2 | 2 | 0.0013689 |
| 3 | 0 | 0 |
We can calculate \(\lambda\) from the above table and \(\lambda = 0.0678\). We will plot the histogram and Poisson distribution with \(\lambda = 0.0678\) to check if they match or not.
x_DV <- 0:5
y_DV <- c(1364,95,2,0,0,0)
fx <- dpois(x=x_DV, lambda=99/(365*4+1))
data_DV <- data.frame(x_DV, y_DV, fx)
ggplot(data_DV, aes(x=x_DV,y=y_DV)) +
ggtitle("Figure 13: Histogram of DV in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_DV) +
ggtitle("Figure 14: Relative frequency histogram of DV in 2016 - 2019 \n and Poisson distribution with lambda = 0.0678") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_DV,y=y_DV/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_DV,y=fx), color='red') +
geom_point(aes(x=x_DV,y=fx), color='red')
We can see that the Poisson distribution fits well with the histogram.
Next, we try to estimate \(99\%\) Confidence Interval of \(\lambda\). The variance of Poisson distribution is equal to its mean (\(\lambda\)). Therefore, \(99\%\) Confidence Interval of \(\lambda\) can be written as \[ \bar{x} - z_{*}\cdot\sqrt{\frac{\bar{x}}{n}} \leq \lambda \leq \bar{x} + z_{*}\cdot\sqrt{\frac{\bar{x}}{n}}, \] where \(\bar{x}\) is the sample mean, \(n\) is the sample size, and \(z_*\) is z-value corresponding to the \(99\%\) confidence interval, and the value is 2.58. From this expression, 99% Confidence Interval of \(\lambda\) for DV before COVID-19 is [0.05, 0.0856].
The frequency table of DV after COVID-19 is as follows.
DF_WM_20_21 <- DF_WM[DF_WM$Arrest.Year%in%c(2020,2021),]
DF_WM_20_21_DV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Offenses Against Family & Children',]
# table of date and the number of occurrences
DV_day_20_21 <- sapply(unique(DF_WM_20_21_DV$Arrest.Date),
function(x){sum(DF_WM_20_21_DV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 680 | 0.9302326 |
| 1 | 47 | 0.0642955 |
| 2 | 1 | 0.001368 |
| 3 | 0 | 0 |
| 4 | 1 | 0.001368 |
| 5 | 0 | 0 |
| … | 0 | 0 |
| 44 | 0 | 0 |
| 45 | 1 | 0.001368 |
| 46 | 0 | 0 |
| … | 0 | 0 |
| 77 | 0 | 0 |
| 78 | 1 | 0.001368 |
| 79 | 0 | 0 |
| … | 0 | 0 |
There are two outliers (45 and 78) in the table. The dates of them are 2021-01-06 and 2020-06-01. Since these dates are correspond to “Capitol attack” and “George Floyd protests”, we will drop the value of these dates.
The calculated \(\lambda = 0.0725\). The histogram and the poisson distribution with \(\lambda = 0.0725\) are shown in Figure 16.
x_DV <- 0:5
y_DV <- c(680,47,1,0,1,0)
fx <- dpois(x=x_DV, lambda=53/(365*2+1))
data_DV <- data.frame(x_DV, y_DV, fx)
ggplot(data_DV, aes(x=x_DV,y=y_DV)) +
ggtitle("Figure 15: Histogram of DV in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_DV) +
ggtitle("Figure 16: Reralive frequency histogram of DV in 2020 - 2021 \n and Pission distribution with lambda = 0.0725") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_DV,y=y_DV/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_DV,y=fx), color='red') +
geom_point(aes(x=x_DV,y=fx), color='red')
The Poisson distribution fits well with the histogram.
99% Confidence Interval of \(\lambda\) for DV after COVID-19 is [0.0465, 0.0985].
Figure 17 shows the Confidence Intervals before and after COVID-19. There was overlap in the Confidence Intervals, and it is not possible to say that there was a change in the \(\lambda\) of “Offenses Against Family & Children” before or after COVID-19.
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(99/(365*4+1)-2.58*(99/(365*4+1)/(356*4+1))**0.5, 99/(365*4+1)+ 2.58*(99/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(53/(365*2+1) - 2.58*(53/(365*2+1)/(356*2+1))**0.5, 53/(365*2+1) + 2.58*(53/(365*2+1)/(356*2+1))**0.5)
data_CI_DV <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_DV) +
ggtitle("Figure 17: 99% Confidence Interval of lambda for DV") +
xlab("") +
ylab("99% Confidence Interval of lambda") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))
The trend of “Traffic Violations” also appears to have changed after COVID-19. The frequency table of Traffic Violations before COVID-19 is as follows.
DF_WM_16_19_TV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Traffic Violations',]
# table of date and the number of occurrences
TV_day_16_19 <- sapply(unique(DF_WM_16_19_TV$Arrest.Date),
function(x){sum(DF_WM_16_19_TV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 602 | 0.4123288 |
| 1 | 530 | 0.3627652 |
| 2 | 225 | 0.1540041 |
| 3 | 77 | 0.0527036 |
| 4 | 22 | 0.0150582 |
| 5 | 4 | 0.0027379 |
| 6 | 1 | 6.844627^{-4} |
| 7 | 0 | 0 |
The calculated \(\lambda = 0.907\). The histogram and the poisson distribution with \(\lambda = 0.907\) are shown in Figure 19.
x_TV <- 0:10
y_TV <- c(602,530,225,77,22,4,1,0,0,0,0)
fx <- dpois(x=x_TV, lambda=sum(TV_day_16_19)/(365*4+1))
data_TV <- data.frame(x_TV, y_TV, fx)
ggplot(data_TV, aes(x=x_TV,y=y_TV)) +
ggtitle("Figure 18: Histogram of traffic violations in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_TV) +
ggtitle("Figure 19: Relative frequency histogram of traffic violations in 2016 - 2019 \n and Pission distribution with lambda = 0.907") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_TV,y=y_TV/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_TV,y=fx), color="red") +
geom_point(aes(x=x_TV,y=fx), color='red')
The Poisson distribution fits well with the histogram.
99% Confidence Interval of \(\lambda\) for Traffic Violations before COVID-19 is [0.842, 0.972].
The frequency table of Traffic Violations after COVID-19 is as follows.
DF_WM_20_21_TV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Traffic Violations',]
# table of date and the number of occurrences
TV_day_20_21 <- sapply(unique(DF_WM_20_21_TV$Arrest.Date),
function(x){sum(DF_WM_20_21_TV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 546 | 0.746922 |
| 1 | 156 | 0.2134063 |
| 2 | 23 | 0.0314637 |
| 3 | 3 | 0.004104 |
| 4 | 3 | 0.004104 |
| 5 | 0 | 0 |
The calculated \(\lambda = 0.306\). The histogram and the poisson distribution with \(\lambda = 0.306\) are shown in Figure 21.
x_TV <- 0:10
y_TV <- c(546,156,23,3,3,0,0,0,0,0,0)
fx <- dpois(x=x_TV, lambda=sum(TV_day_20_21)/(365*2+1))
data_TV <- data.frame(x_TV, y_TV, fx)
ggplot(data_TV, aes(x=x_TV,y=y_TV)) +
ggtitle("Figure 20: Histogram of traffic violations in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_TV) +
ggtitle("Figure 21: Relative frequency histogram of traffic violations in 2020 - 2021 \n and Pission distribution with lambda = 0.306") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_TV,y=y_TV/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_TV,y=fx), color="red") +
geom_point(aes(x=x_TV,y=fx), color='red')
The Poisson distribution fits well with the histogram.
99% Confidence Interval of \(\lambda\) for Traffic Violations before COVID-19 is [0.252, 0.358].
Figure 22 shows the Confidence Intervals before and after COVID-19. There was no overlap in the Confidence Intervals, and there may have been a change in the Traffic Violations lambda before and after COVID-19.
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(TV_day_16_19)/(365*4+1) - 2.58*(sum(TV_day_16_19)/(365*4+1)/(356*4+1))**0.5,
sum(TV_day_16_19)/(365*4+1) + 2.58*(sum(TV_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(TV_day_20_21)/(365*2+1) - 2.58*(sum(TV_day_20_21)/(365*2+1)/(356*2+1))**0.5,
sum(TV_day_20_21)/(365*2+1) + 2.58*(sum(TV_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_TV <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_TV) +
ggtitle("Figure 22: 99% Confidence Interval of lambda for Traffic Violations") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))
The below table shows the frequency and relative frequency of Simple Assault before COVID-19.
DF_WM_16_19_SA <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Simple Assault',]
# table of date and the number of occurrences
SA_day_16_19 <- sapply(unique(DF_WM_16_19_SA$Arrest.Date),
function(x){sum(DF_WM_16_19_SA$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 438 | 0.2997947 |
| 1 | 479 | 0.3278576 |
| 2 | 284 | 0.1943874 |
| 3 | 156 | 0.1067762 |
| 4 | 76 | 0.0520192 |
| 5 | 13 | 0.008898 |
| 6 | 8 | 0.0054757 |
| 7 | 4 | 0.0027379 |
| 8 | 1 | 6.844627^{-4} |
| 9 | 2 | 0.0013689 |
| 10 | 0 | 0 |
We got \(\lambda = 1.36\) by calculating the average of occurrences per day.
x_SA <- 0:10
y_SA <- c(438,479,284,156,76,13,8,4,1,2,0)
fx <- dpois(x=x_SA, lambda=sum(SA_day_16_19)/(365*4+1))
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA, aes(x=x_SA,y=y_SA)) +
ggtitle("Figure 23: Histogram of Simple Assault in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_SA) +
ggtitle("Figure 24: Relative frequency histogram of Simple Assault in 2016 - 2019 \n and Poisson distribution with lambda = 1.36") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
The frequency and relative frequency in 2020 and 2021 is shown in below. The \(\lambda\) for 2020 and 2021 was \(0.923\).
DF_WM_20_21_SA <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Simple Assault',]
# table of date and the number of occurrences
SA_day_20_21 <- sapply(unique(DF_WM_20_21_SA$Arrest.Date),
function(x){sum(DF_WM_20_21_SA$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 297 | 0.4062927 |
| 1 | 267 | 0.3652531 |
| 2 | 112 | 0.1532148 |
| 3 | 43 | 0.0588235 |
| 4 | 8 | 0.0109439 |
| 5 | 3 | 0.004104 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| 8 | 1 | 0.001368 |
| 9 | 0 | 0 |
x_SA <- 0:10
y_SA <- c(297,267,112,43,8,3,0,0,1,0,0)
fx <- dpois(x=x_SA, lambda=sum(SA_day_20_21)/(365*2+1))
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA, aes(x=x_SA,y=y_SA)) +
ggtitle("Figure 25: Histogram of Simple Assault in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_SA) +
ggtitle("Figure 25: Relative frequency histogram of Simple Assault in 2020 - 2021 \n and Poisson distribution with lambda = 0.923") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
Figure 26 shows the Confidence Intervals before and after COVID-19. There was no overlap in the Confidence Intervals, and there may have been a change in the Simple Assault lambda before and after COVID-19.
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(SA_day_16_19)/(365*4+1) - 2.58*(sum(SA_day_16_19)/(365*4+1)/(356*4+1))**0.5,
sum(SA_day_16_19)/(365*4+1) + 2.58*(sum(SA_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(SA_day_20_21)/(365*2+1) - 2.58*(sum(SA_day_20_21)/(365*2+1)/(356*2+1))**0.5,
sum(SA_day_20_21)/(365*2+1) + 2.58*(sum(SA_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_SA <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_SA) +
ggtitle("Figure 26: 99% Confidence Interval of lambda for Simple Assault") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))
The frequency and relative frequency in 2016 to 2019 is shown in below. The \(\lambda\) before COVID-19 was \(0.404\).
DF_WM_16_19_T <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Theft',]
# table of date and the number of occurrences
T_day_16_19 <- sapply(unique(DF_WM_16_19_T$Arrest.Date),
function(x){sum(DF_WM_16_19_T$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 973 | 0.6659822 |
| 1 | 398 | 0.2724162 |
| 2 | 79 | 0.0540726 |
| 3 | 10 | 0.0068446 |
| 4 | 1 | 6.844627^{-4} |
| 5 | 0 | 0 |
x_T <- 0:5
y_T <- c(973,398,79,10,1,0)
fx <- dpois(x=x_T, lambda=sum(T_day_16_19)/(365*4+1))
data_T <- data.frame(x_T, y_T, fx)
ggplot(data_T, aes(x=x_T,y=y_T)) +
ggtitle("Figure 27: Histogram of Theft in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_T) +
ggtitle("Figure 28: Relative frequency histogram of Theft in 2016 - 2019 \n and Poisson distribution with lambda = 0.404") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_T,y=y_T/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_T,y=fx), color='red') +
geom_point(aes(x=x_T,y=fx), color='red')
The frequency and relative frequency in 2020 and 2021 are shown in below. The \(\lambda\) for 2020 and 2021 was \(0.115\).
DF_WM_20_21_T <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Theft',]
# table of date and the number of occurrences
T_day_20_21 <- sapply(unique(DF_WM_20_21_T$Arrest.Date),
function(x){sum(DF_WM_20_21_T$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 653 | 0.8932969 |
| 1 | 72 | 0.0984952 |
| 2 | 6 | 0.0082079 |
| 3 | 0 | 0 |
x_T <- 0:3
y_T <- c(653,72,6,0)
fx <- dpois(x=x_T, lambda=sum(T_day_20_21)/(365*2+1))
data_T <- data.frame(x_T, y_T, fx)
ggplot(data_T, aes(x=x_T,y=y_T)) +
ggtitle("Figure 29: Histogram of Theft in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_T) +
ggtitle("Figure 30: Relative frequency histogram of Theft in 2020 - 2021 \n and Poisson distribution with lambda = 0.115") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_T,y=y_T/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_T,y=fx), color='red') +
geom_point(aes(x=x_T,y=fx), color='red')
Figure 31 shows the Confidence Intervals before and after COVID-19. There was no overlap in the Confidence Intervals, and there may have been a change in the Theft lambda before and after COVID-19.
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(T_day_16_19)/(365*4+1) - 2.58*(sum(T_day_16_19)/(365*4+1)/(356*4+1))**0.5,
sum(T_day_16_19)/(365*4+1) + 2.58*(sum(T_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(T_day_20_21)/(365*2+1) - 2.58*(sum(T_day_20_21)/(365*2+1)/(356*2+1))**0.5,
sum(T_day_20_21)/(365*2+1) + 2.58*(sum(T_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_T <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_T) +
ggtitle("Figure 31: 99% Confidence Interval of lambda for Theft") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))
Statistically significant reductions in Simple Assault and Traffic Violations were observed for \(\lambda\) before and after COVID-19. Since these crimes seem to be more likely to occur the more people are out, it is likely that the restrictions and curbs on going out due to COVID-19 contributed to the decrease in these crimes.
A statistically significant decrease in theft was also observed in \(\lambda\) before and after Corona. Considering that thefts are committed against empty homes, the decrease in empty homes due to the curfew restrictions caused by COVID-19 may have contributed to the decrease in thefts.
The more time one spends at home due, the more Offenses Against Family & Children are likely to increase. In fact, in terms of the number of cases alone, Offenses Against Family & Children have increased after COVID-19. At first glance, the curfew restrictions caused by COVID-19 seems to be the cause. However, most of these cases were caused by special incidents unrelated to COVID-19, and when these effects were removed, there was no statistically significant difference in the change in Offenses Against Family & Children before and after COVID-19. As for white males in the DC area, Offenses Against Family & Children to the point of arrest does not appear to be affected by the changes in their lives caused by COVID-19.
# table of date and the number of occurrences
SA_day_16 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2016,]$Arrest.Date),
function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2016,]$Arrest.Date==x)})
SA_day_17 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2017,]$Arrest.Date),
function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2017,]$Arrest.Date==x)})
SA_day_18 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2018,]$Arrest.Date),
function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2018,]$Arrest.Date==x)})
SA_day_19 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2019,]$Arrest.Date),
function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2019,]$Arrest.Date==x)})
# 2016
x_SA <- 0:10
y_SA <- c(365+1-length(SA_day_16),length(SA_day_16[SA_day_16==1]),length(SA_day_16[SA_day_16==2]),
length(SA_day_16[SA_day_16==3]),length(SA_day_16[SA_day_16==4]),
length(SA_day_16[SA_day_16==5]),length(SA_day_16[SA_day_16==6]),
length(SA_day_16[SA_day_16==7]),length(SA_day_16[SA_day_16==8]),
length(SA_day_16[SA_day_16==9]),length(SA_day_16[SA_day_16==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_16)/(365+1))
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA) +
ggtitle("Figure: Relative frequency histogram of Simple Assault in 2016
\n and Poisson distribution with lambda = 1.5") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/(365+1)), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
# 2017
x_SA <- 0:10
y_SA <- c(365-length(SA_day_17),length(SA_day_17[SA_day_17==1]),length(SA_day_17[SA_day_17==2]),
length(SA_day_17[SA_day_17==3]),length(SA_day_17[SA_day_17==4]),
length(SA_day_17[SA_day_17==5]),length(SA_day_17[SA_day_17==6]),
length(SA_day_17[SA_day_17==7]),length(SA_day_17[SA_day_17==8]),
length(SA_day_17[SA_day_17==9]),length(SA_day_17[SA_day_17==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_17)/365)
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA) +
ggtitle("Figure: Relative frequency histogram of Simple Assault in 2017
\n and Poisson distribution with lambda = 1.33") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
# 2018
x_SA <- 0:10
y_SA <- c(365-length(SA_day_18),length(SA_day_18[SA_day_18==1]),length(SA_day_18[SA_day_18==2]),
length(SA_day_18[SA_day_18==3]),length(SA_day_18[SA_day_18==4]),
length(SA_day_18[SA_day_18==5]),length(SA_day_18[SA_day_18==6]),
length(SA_day_18[SA_day_18==7]),length(SA_day_18[SA_day_18==8]),
length(SA_day_18[SA_day_18==9]),length(SA_day_18[SA_day_18==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_18)/365)
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA) +
ggtitle("Figure: Relative frequency histogram of Simple Assault in 2018
\n and Poisson distribution with lambda = 1.32") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
# 2019
x_SA <- 0:10
y_SA <- c(365-length(SA_day_19),length(SA_day_19[SA_day_19==1]),length(SA_day_19[SA_day_19==2]),
length(SA_day_19[SA_day_19==3]),length(SA_day_19[SA_day_19==4]),
length(SA_day_19[SA_day_19==5]),length(SA_day_19[SA_day_19==6]),
length(SA_day_19[SA_day_19==7]),length(SA_day_19[SA_day_19==8]),
length(SA_day_19[SA_day_19==9]),length(SA_day_19[SA_day_19==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_19)/365)
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA) +
ggtitle("Figure: Relative frequency histogram of Simple Assault in 2019
\n and Poisson distribution with lambda = 1.3") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
# table of date and the number of occurrences
SA_day_20 <- sapply(unique(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2020,]$Arrest.Date),
function(x){sum(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2020,]$Arrest.Date==x)})
SA_day_21 <- sapply(unique(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2021,]$Arrest.Date),
function(x){sum(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2021,]$Arrest.Date==x)})
# 2020
x_SA <- 0:10
y_SA <- c(365+1-length(SA_day_20),length(SA_day_20[SA_day_20==1]),length(SA_day_20[SA_day_20==2]),
length(SA_day_20[SA_day_20==3]),length(SA_day_20[SA_day_20==4]),
length(SA_day_20[SA_day_20==5]),length(SA_day_20[SA_day_20==6]),
length(SA_day_20[SA_day_20==7]),length(SA_day_20[SA_day_20==8]),
length(SA_day_20[SA_day_20==9]),length(SA_day_20[SA_day_20==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_20)/(365+1))
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA) +
ggtitle("Figure: Relative frequency histogram of Simple Assault in 2020
\n and Poisson distribution with lambda = 0.954") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/(365+1)), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
# 2021
x_SA <- 0:10
y_SA <- c(365-length(SA_day_21),length(SA_day_21[SA_day_21==1]),length(SA_day_21[SA_day_21==2]),
length(SA_day_21[SA_day_21==3]),length(SA_day_21[SA_day_21==4]),
length(SA_day_21[SA_day_21==5]),length(SA_day_21[SA_day_21==6]),
length(SA_day_21[SA_day_21==7]),length(SA_day_21[SA_day_21==8]),
length(SA_day_21[SA_day_21==9]),length(SA_day_21[SA_day_21==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_21)/365)
data_SA <- data.frame(x_SA, y_SA, fx)
ggplot(data_SA) +
ggtitle("Figure: Relative frequency histogram of Simple Assault in 2021
\n and Poisson distribution with lambda = 0.893") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
geom_line(aes(x=x_SA,y=fx), color='red') +
geom_point(aes(x=x_SA,y=fx), color='red')
x <- c('2016','2017','2018','2019','2020','2021')
interval_2016 <- c(sum(SA_day_16)/(365+1) - 2.58*(sum(SA_day_16)/(365+1)/(356+1))**0.5,
sum(SA_day_16)/(365+1) + 2.58*(sum(SA_day_16)/(365+1)/(356+1))**0.5)
interval_2017 <- c(sum(SA_day_17)/(365) - 2.58*(sum(SA_day_17)/(365)/(356))**0.5,
sum(SA_day_17)/(365) + 2.58*(sum(SA_day_17)/(365)/(356))**0.5)
interval_2018 <- c(sum(SA_day_18)/(365) - 2.58*(sum(SA_day_18)/(365)/(356))**0.5,
sum(SA_day_18)/(365) + 2.58*(sum(SA_day_18)/(365)/(356))**0.5)
interval_2019 <- c(sum(SA_day_19)/(365) - 2.58*(sum(SA_day_19)/(365)/(356))**0.5,
sum(SA_day_19)/(365) + 2.58*(sum(SA_day_19)/(365)/(356))**0.5)
interval_2020 <- c(sum(SA_day_20)/(365+1) - 2.58*(sum(SA_day_20)/(365+1)/(356+1))**0.5,
sum(SA_day_20)/(365+1) + 2.58*(sum(SA_day_20)/(365+1)/(356+1))**0.5)
interval_2021 <- c(sum(SA_day_21)/(365) - 2.58*(sum(SA_day_21)/(365)/(356))**0.5,
sum(SA_day_21)/(365) + 2.58*(sum(SA_day_21)/(365)/(356))**0.5)
data_CI_SA <- data.frame(x,interval_2016,interval_2017,interval_2018,interval_2019,interval_2020,interval_2021)
ggplot(data_CI_SA) +
ggtitle("Figure: 99% Confidence Interval of lambda for Simple Assault") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=interval_2016[1], ymax=interval_2016[2])) +
geom_linerange(aes(x=x[2], ymin=interval_2017[1], ymax=interval_2017[2])) +
geom_linerange(aes(x=x[3], ymin=interval_2018[1], ymax=interval_2018[2])) +
geom_linerange(aes(x=x[4], ymin=interval_2019[1], ymax=interval_2019[2])) +
geom_linerange(aes(x=x[5], ymin=interval_2020[1], ymax=interval_2020[2])) +
geom_linerange(aes(x=x[6], ymin=interval_2021[1], ymax=interval_2021[2]))
mean16 = sum(SA_day_16)/(365+1)
mean17 = sum(SA_day_17)/(365)
mean18 = sum(SA_day_18)/(365)
mean19 = sum(SA_day_19)/(365)
mean20 = sum(SA_day_20)/(365+1)
mean21 = sum(SA_day_21)/(365)
var16 = (((0-mean16)**2)*(366 - length(SA_day_16)) + ((1-mean16)**2)*length(SA_day_16[SA_day_16==1]) +
((2-mean16)**2)*length(SA_day_16[SA_day_16==2]) + ((3-mean16)**2)*length(SA_day_16[SA_day_16==3]) +
((4-mean16)**2)*length(SA_day_16[SA_day_16==4]) + ((5-mean16)**2)*length(SA_day_16[SA_day_16==5]) +
((6-mean16)**2)*length(SA_day_16[SA_day_16==6]) + ((7-mean16)**2)*length(SA_day_16[SA_day_16==7]) +
((8-mean16)**2)*length(SA_day_16[SA_day_16==8]) + ((9-mean16)**2)*length(SA_day_16[SA_day_16==9]))/365
var17 = (((0-mean17)**2)*(365 - length(SA_day_17)) + ((1-mean17)**2)*length(SA_day_17[SA_day_17==1]) +
((2-mean17)**2)*length(SA_day_17[SA_day_17==2]) + ((3-mean17)**2)*length(SA_day_17[SA_day_17==3]) +
((4-mean17)**2)*length(SA_day_17[SA_day_17==4]) + ((5-mean17)**2)*length(SA_day_17[SA_day_17==5]) +
((6-mean17)**2)*length(SA_day_17[SA_day_17==6]) + ((7-mean17)**2)*length(SA_day_17[SA_day_17==7]) +
((8-mean17)**2)*length(SA_day_17[SA_day_17==8]) + ((9-mean17)**2)*length(SA_day_17[SA_day_17==9]))/364
var18 = (((0-mean18)**2)*(366 - length(SA_day_18)) + ((1-mean18)**2)*length(SA_day_18[SA_day_18==1]) +
((2-mean18)**2)*length(SA_day_18[SA_day_18==2]) + ((3-mean18)**2)*length(SA_day_18[SA_day_18==3]) +
((4-mean18)**2)*length(SA_day_18[SA_day_18==4]) + ((5-mean18)**2)*length(SA_day_18[SA_day_18==5]) +
((6-mean18)**2)*length(SA_day_18[SA_day_18==6]) + ((7-mean18)**2)*length(SA_day_18[SA_day_18==7]) +
((8-mean18)**2)*length(SA_day_18[SA_day_18==8]) + ((9-mean18)**2)*length(SA_day_18[SA_day_18==9]))/364
var19 = (((0-mean19)**2)*(366 - length(SA_day_19)) + ((1-mean19)**2)*length(SA_day_19[SA_day_19==1]) +
((2-mean19)**2)*length(SA_day_19[SA_day_19==2]) + ((3-mean19)**2)*length(SA_day_19[SA_day_19==3]) +
((4-mean19)**2)*length(SA_day_19[SA_day_19==4]) + ((5-mean19)**2)*length(SA_day_19[SA_day_19==5]) +
((6-mean19)**2)*length(SA_day_19[SA_day_19==6]) + ((7-mean19)**2)*length(SA_day_19[SA_day_19==7]) +
((8-mean19)**2)*length(SA_day_19[SA_day_19==8]) + ((9-mean19)**2)*length(SA_day_19[SA_day_19==9]))/364
var20 = (((0-mean20)**2)*(366 - length(SA_day_20)) + ((1-mean20)**2)*length(SA_day_20[SA_day_20==1]) +
((2-mean20)**2)*length(SA_day_16[SA_day_20==2]) + ((3-mean20)**2)*length(SA_day_20[SA_day_20==3]) +
((4-mean20)**2)*length(SA_day_16[SA_day_20==4]) + ((5-mean20)**2)*length(SA_day_20[SA_day_20==5]) +
((6-mean20)**2)*length(SA_day_16[SA_day_20==6]) + ((7-mean20)**2)*length(SA_day_20[SA_day_20==7]) +
((8-mean20)**2)*length(SA_day_16[SA_day_20==8]) + ((9-mean20)**2)*length(SA_day_20[SA_day_20==9]))/365
var21 = (((0-mean21)**2)*(366 - length(SA_day_21)) + ((1-mean21)**2)*length(SA_day_21[SA_day_21==1]) +
((2-mean21)**2)*length(SA_day_21[SA_day_21==2]) + ((3-mean21)**2)*length(SA_day_21[SA_day_21==3]) +
((4-mean21)**2)*length(SA_day_21[SA_day_21==4]) + ((5-mean21)**2)*length(SA_day_21[SA_day_21==5]) +
((6-mean21)**2)*length(SA_day_21[SA_day_21==6]) + ((7-mean21)**2)*length(SA_day_21[SA_day_21==7]) +
((8-mean21)**2)*length(SA_day_21[SA_day_21==8]) + ((9-mean21)**2)*length(SA_day_21[SA_day_21==9]))/364
print('The mean and variance of SA in 2016 are ')
mean16
var16
print('The mean and variance of SA in 2017 are ')
mean17
var17
print('The mean and variance of SA in 2018 are ')
mean18
var18
print('The mean and variance of SA in 2019 are ')
mean19
var19
print('The mean and variance of SA in 2020 are ')
mean20
var20
print('The mean and variance of SA in 2021 are ')
mean21
var21
DF_WM_16_19_RV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Release Violations/Fugitive',]
# table of date and the number of occurrences
RV_day_16_19 <- sapply(unique(DF_WM_16_19_RV$Arrest.Date),
function(x){sum(DF_WM_16_19_RV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 817 | 0.559206 |
| 1 | 428 | 0.29295 |
| 2 | 168 | 0.1149897 |
| 3 | 40 | 0.0273785 |
| 4 | 5 | 0.0034223 |
| 5 | 3 | 0.0020534 |
| 6 | 0 | 0 |
x_RV <- 0:6
y_RV <- c(817,428,168,40,5,3,0)
fx <- dpois(x=x_RV, lambda=sum(RV_day_16_19)/(365*4+1))
data_RV <- data.frame(x_RV, y_RV, fx)
ggplot(data_RV, aes(x=x_RV,y=y_RV)) +
ggtitle("Figure : Histogram of Release Violations/Fugitive in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_RV) +
ggtitle("Figure : Relative frequency histogram of Release Violations/Fugitive in 2016 - 2019 \n and Poisson distribution with lambda = 0.629") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_RV,y=y_RV/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_RV,y=fx), color='red') +
geom_point(aes(x=x_RV,y=fx), color='red')
DF_WM_20_21_RV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Release Violations/Fugitive',]
# table of date and the number of occurrences
RV_day_20_21 <- sapply(unique(DF_WM_20_21_RV$Arrest.Date),
function(x){sum(DF_WM_20_21_RV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 549 | 0.751026 |
| 1 | 154 | 0.2106703 |
| 2 | 24 | 0.0328317 |
| 3 | 4 | 0.005472 |
| 4 | 0 | 0 |
x_RV <- 0:4
y_RV <- c(549,154,24,4,0)
fx <- dpois(x=x_RV, lambda=sum(RV_day_20_21)/(365*2+1))
data_RV <- data.frame(x_RV, y_RV, fx)
ggplot(data_RV, aes(x=x_RV,y=y_RV)) +
ggtitle("Figure : Histogram of Release Violations/Fugitive in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_RV) +
ggtitle("Figure : Relative frequency histogram of Release Violations/Fugitive in 2020 - 2021 \n and Poisson distribution with lambda = 0.293") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_RV,y=y_RV/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_RV,y=fx), color='red') +
geom_point(aes(x=x_RV,y=fx), color='red')
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(RV_day_16_19)/(365*4+1) - 2.58*(sum(RV_day_16_19)/(365*4+1)/(356*4+1))**0.5,
sum(RV_day_16_19)/(365*4+1) + 2.58*(sum(RV_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(RV_day_20_21)/(365*2+1) - 2.58*(sum(RV_day_20_21)/(365*2+1)/(356*2+1))**0.5,
sum(RV_day_20_21)/(365*2+1) + 2.58*(sum(RV_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_RV <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_RV) +
ggtitle("Figure : 99% Confidence Interval of lambda for Release Violations/Fugitive") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))
DF_WM_16_19_DI <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Driving/Boating While Intoxicated',]
# table of date and the number of occurrences
DI_day_16_19 <- sapply(unique(DF_WM_16_19_DI$Arrest.Date),
function(x){sum(DF_WM_16_19_DI$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 875 | 0.5989049 |
| 1 | 416 | 0.2847365 |
| 2 | 138 | 0.0944559 |
| 3 | 25 | 0.0171116 |
| 4 | 6 | 0.0041068 |
| 5 | 1 | 6.844627^{-4} |
| 6 | 0 | 0 |
x_DI <- 0:6
y_DI <- c(875,416,138,25,6,1,0)
fx <- dpois(x=x_DI, lambda=sum(DI_day_16_19)/(365*4+1))
data_DI <- data.frame(x_DI, y_DI, fx)
ggplot(data_DI, aes(x=x_DI,y=y_DI)) +
ggtitle("Figure : Histogram of Driving/Boating While Intoxicated in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_DI) +
ggtitle("Figure : Relative frequency histogram of Driving/Boating While Intoxicated in 2016 - 2019 \n and Poisson distribution with lambda = 0.545") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_DI,y=y_DI/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_DI,y=fx), color='red') +
geom_point(aes(x=x_DI,y=fx), color='red')
DF_WM_20_21_DI <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Driving/Boating While Intoxicated',]
# table of date and the number of occurrences
DI_day_20_21 <- sapply(unique(DF_WM_20_21_DI$Arrest.Date),
function(x){sum(DF_WM_20_21_DI$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 527 | 0.7209302 |
| 1 | 168 | 0.2298222 |
| 2 | 29 | 0.0396717 |
| 3 | 5 | 0.0068399 |
| 4 | 2 | 0.002736 |
| 5 | 0 | 0 |
x_DI <- 0:5
y_DI <- c(527,168,29,5,2,0)
fx <- dpois(x=x_DI, lambda=sum(DI_day_20_21)/(365*2+1))
data_DI <- data.frame(x_DI, y_DI, fx)
ggplot(data_DI, aes(x=x_DI,y=y_DI)) +
ggtitle("Figure : Histogram of Driving/Boating While Intoxicated in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_DI) +
ggtitle("Figure : Relative frequency histogram of Driving/Boating While Intoxicated in 2020 - 2021 \n and Poisson distribution with lambda = 0.341") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_DI,y=y_DI/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_DI,y=fx), color='red') +
geom_point(aes(x=x_DI,y=fx), color='red')
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(DI_day_16_19)/(365*4+1) - 2.58*(sum(DI_day_16_19)/(365*4+1)/(356*4+1))**0.5,
sum(DI_day_16_19)/(365*4+1) + 2.58*(sum(DI_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(DI_day_20_21)/(365*2+1) - 2.58*(sum(DI_day_20_21)/(365*2+1)/(356*2+1))**0.5,
sum(DI_day_20_21)/(365*2+1) + 2.58*(sum(DI_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_DI <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_DI) +
ggtitle("Figure : 99% Confidence Interval of lambda for Driving/Boating While Intoxicated") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))
DF_WM_16_19_N <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Narcotics',]
# table of date and the number of occurrences
N_day_16_19 <- sapply(unique(DF_WM_16_19_N$Arrest.Date),
function(x){sum(DF_WM_16_19_N$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 1053 | 0.7207392 |
| 1 | 318 | 0.2176591 |
| 2 | 66 | 0.0451745 |
| 3 | 10 | 0.0068446 |
| 4 | 5 | 0.0034223 |
| 5 | 6 | 0.0041068 |
| 6 | 0 | 0 |
| 7 | 1 | 6.844627^{-4} |
| 8 | 0 | 0 |
| 9 | 1 | 6.844627^{-4} |
| 10 | 0 | 0 |
| 11 | 0 | 0 |
| 12 | 0 | 0 |
| 13 | 1 | 6.844627^{-4} |
| 14 | 0 | 0 |
x_N <- 0:14
y_N <- c(1053,318,66,10,5,6,0,1,0,1,0,0,0,1,0)
fx <- dpois(x=x_N, lambda=sum(N_day_16_19)/(365*4+1))
data_N <- data.frame(x_N, y_N, fx)
ggplot(data_N, aes(x=x_N,y=y_N)) +
ggtitle("Figure : Histogram of Narcotics in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_N) +
ggtitle("Figure : Relative frequency histogram of Narcotics in 2016 - 2019 \n and Poisson distribution with lambda = 0.383") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_N,y=y_N/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_N,y=fx), color='red') +
geom_point(aes(x=x_N,y=fx), color='red')
DF_WM_20_21_N <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Narcotics',]
# table of date and the number of occurrences
N_day_20_21 <- sapply(unique(DF_WM_20_21_N$Arrest.Date),
function(x){sum(DF_WM_20_21_N$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 652 | 0.8919289 |
| 1 | 66 | 0.0902873 |
| 2 | 10 | 0.0136799 |
| 3 | 3 | 0.004104 |
| 4 | 0 | 0 |
x_N <- 0:4
y_N <- c(652,66,10,3,0)
fx <- dpois(x=x_N, lambda=sum(N_day_20_21)/(365*2+1))
data_N <- data.frame(x_N, y_N, fx)
ggplot(data_N, aes(x=x_N,y=y_N)) +
ggtitle("Figure : Histogram of Narcotics in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_N) +
ggtitle("Figure : Relative frequency histogram of Narcotics in 2020 - 2021 \n and Poisson distribution with lambda = 0.13") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_N,y=y_N/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_N,y=fx), color='red') +
geom_point(aes(x=x_N,y=fx), color='red')
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(N_day_16_19)/(365*4+1) - 2.58*(sum(N_day_16_19)/(365*4+1)/(356*4+1))**0.5,
sum(N_day_16_19)/(365*4+1) + 2.58*(sum(N_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(N_day_20_21)/(365*2+1) - 2.58*(sum(N_day_20_21)/(365*2+1)/(356*2+1))**0.5,
sum(N_day_20_21)/(365*2+1) + 2.58*(sum(N_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_N <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_N) +
ggtitle("Figure : 99% Confidence Interval of lambda for Narcotics") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))
DF_WM_16_19_LV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Liquor Law Violations',]
# table of date and the number of occurrences
LV_day_16_19 <- sapply(unique(DF_WM_16_19_LV$Arrest.Date),
function(x){sum(DF_WM_16_19_LV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 1090 | 0.7460643 |
| 1 | 259 | 0.1772758 |
| 2 | 76 | 0.0520192 |
| 3 | 32 | 0.0219028 |
| 4 | 3 | 0.0020534 |
| 5 | 1 | 6.844627^{-4} |
| 6 | 0 | 0 |
x_LV <- 0:6
y_LV <- c(1090,259,76,32,3,1,0)
fx <- dpois(x=x_LV, lambda=sum(LV_day_16_19)/(365*4+1))
data_LV <- data.frame(x_LV, y_LV, fx)
ggplot(data_LV, aes(x=x_LV,y=y_LV)) +
ggtitle("Figure : Histogram of Liquor Law Violations in 2016 - 2019") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_LV) +
ggtitle("Figure : Relative frequency histogram of Liquor Law Violations in 2016 - 2019 \n and Poisson distribution with lambda = 0.359") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_LV,y=y_LV/(365*4+1)), stat = "identity") +
geom_line(aes(x=x_LV,y=fx), color='red') +
geom_point(aes(x=x_LV,y=fx), color='red')
DF_WM_20_21_LV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Liquor Law Violations',]
# table of date and the number of occurrences
LV_day_20_21 <- sapply(unique(DF_WM_20_21_LV$Arrest.Date),
function(x){sum(DF_WM_20_21_LV$Arrest.Date==x)})
| # of occurrences per day | Frequency | Relative frequency |
|---|---|---|
| 0 | 699 | 0.9562244 |
| 1 | 27 | 0.0369357 |
| 2 | 4 | 0.005472 |
| 3 | 1 | 0.001368 |
| 4 | 0 | 0 |
x_LV <- 0:4
y_LV <- c(699,27,4,1,0)
fx <- dpois(x=x_LV, lambda=sum(LV_day_20_21)/(365*2+1))
data_LV <- data.frame(x_LV, y_LV, fx)
ggplot(data_LV, aes(x=x_LV,y=y_LV)) +
ggtitle("Figure : Histogram of Liquor Law Violations in 2020 - 2021") +
xlab("Number of occurrences per day") +
ylab("Frequency") +
geom_bar(stat = "identity")
ggplot(data_LV) +
ggtitle("Figure : Relative frequency histogram of Liquor Law Violations in 2020 - 2021 \n and Poisson distribution with lambda = 0.052") +
xlab("Number of occurrences per day") +
ylab("Relative frequency") +
geom_bar(aes(x=x_LV,y=y_LV/(365*2+1)), stat = "identity") +
geom_line(aes(x=x_LV,y=fx), color='red') +
geom_point(aes(x=x_LV,y=fx), color='red')
x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(LV_day_16_19)/(365*4+1) - 2.58*(sum(LV_day_16_19)/(365*4+1)/(356*4+1))**0.5,
sum(LV_day_16_19)/(365*4+1) + 2.58*(sum(LV_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(LV_day_20_21)/(365*2+1) - 2.58*(sum(LV_day_20_21)/(365*2+1)/(356*2+1))**0.5,
sum(LV_day_20_21)/(365*2+1) + 2.58*(sum(LV_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_LV <- data.frame(x,pre_covid_interval,post_covid_interval)
ggplot(data_CI_LV) +
ggtitle("Figure : 99% Confidence Interval of lambda for Liquor Law Violations") +
ylab("99% Confidence Interval of lambda") +
xlab("") +
coord_flip() +
geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))